Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 98913 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 91012 |
| Duplicate rows (%) | 92.0% |
| Total size in memory | 11.6 MiB |
| Average record size in memory | 123.0 B |
Variable types
| Categorical | 1 |
|---|---|
| Numeric | 8 |
Warnings
| Warning | Type |
|---|---|
| Dataset has 91012 (92.0%) duplicate rows | Duplicates |
| socialNbFollowers is highly skewed (γ1 = 88.81691016) | Skewed |
| socialNbFollows is highly skewed (γ1 = 220.8766787) | Skewed |
| socialProductsLiked is highly skewed (γ1 = 244.1577429) | Skewed |
| productsListed is highly skewed (γ1 = 64.89321853) | Skewed |
| productsSold is highly skewed (γ1 = 41.59563253) | Skewed |
| productsWished is highly skewed (γ1 = 49.25695941) | Skewed |
| productsBought is highly skewed (γ1 = 84.79735987) | Skewed |
| socialProductsLiked has 82987 (83.9%) zeros | Zeros |
| productsListed has 97189 (98.3%) zeros | Zeros |
| productsSold has 96877 (97.9%) zeros | Zeros |
| productsPassRate has 97979 (99.1%) zeros | Zeros |
| productsWished has 89612 (90.6%) zeros | Zeros |
| productsBought has 93494 (94.5%) zeros | Zeros |
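The skewness warnings report γ1, a standardized third moment. The report does not say which estimator it uses, so the following is only a sketch that assumes the bias-corrected (adjusted Fisher-Pearson) form. The function name `sample_skewness` and the toy data are illustrative and are not taken from the report.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed estimator, see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Second and third central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    # Small-sample bias correction.
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# Toy data shaped like the columns above: mostly zeros plus one large value.
toy = [0] * 95 + [1, 2, 3, 5, 250]
print(sample_skewness(toy))
```

A column dominated by zeros with a handful of very large values, as in the Zeros warnings, produces exactly this kind of large positive γ1.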
Reproduction
| Analysis started | 2021-04-01 09:47:08.497880 |
|---|---|
| Analysis finished | 2021-04-01 09:48:04.246175 |
| Duration | 55.75 seconds |
| Software version | pandas-profiling v2.10.1 |
| Download configuration | config.yaml |
language
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 197826 |
|---|---|
| Distinct characters | 8 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | en |
|---|---|
| 2nd row | en |
| 3rd row | fr |
| 4th row | en |
| 5th row | en |
| Value | Count | Frequency (%) |
|---|---|---|
| en | 51564 | 52.1% |
| fr | 26372 | 26.7% |
| it | 7766 | 7.9% |
| de | 7178 | 7.3% |
| es | 6033 | 6.1% |
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 64775 | 32.7% |
| n | 51564 | 26.1% |
| f | 26372 | 13.3% |
| r | 26372 | 13.3% |
| i | 7766 | 3.9% |
| t | 7766 | 3.9% |
| d | 7178 | 3.6% |
| s | 6033 | 3.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 197826 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
|---|---|---|
| e | 64775 | 32.7% |
| n | 51564 | 26.1% |
| f | 26372 | 13.3% |
| r | 26372 | 13.3% |
| i | 7766 | 3.9% |
| t | 7766 | 3.9% |
| d | 7178 | 3.6% |
| s | 6033 | 3.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 197826 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
|---|---|---|
| e | 64775 | 32.7% |
| n | 51564 | 26.1% |
| f | 26372 | 13.3% |
| r | 26372 | 13.3% |
| i | 7766 | 3.9% |
| t | 7766 | 3.9% |
| d | 7178 | 3.6% |
| s | 6033 | 3.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 197826 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
|---|---|---|
| e | 64775 | 32.7% |
| n | 51564 | 26.1% |
| f | 26372 | 13.3% |
| r | 26372 | 13.3% |
| i | 7766 | 3.9% |
| t | 7766 | 3.9% |
| d | 7178 | 3.6% |
| s | 6033 | 3.0% |
socialNbFollowers
Numeric
| Distinct | 90 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.432268761 |
|---|---|
| Minimum | 3 |
| Maximum | 744 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 3 |
| median | 3 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 744 |
| Range | 741 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 3.882383028 |
|---|---|
| Coefficient of variation (CV) | 1.131141906 |
| Kurtosis | 14415.30703 |
| Mean | 3.432268761 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 88.81691016 |
| Sum | 339496 |
| Variance | 15.07289798 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 84939 | 85.9% |
| 4 | 8219 | 8.3% |
| 5 | 2720 | 2.7% |
| 6 | 813 | 0.8% |
| 7 | 539 | 0.5% |
| 8 | 336 | 0.3% |
| 9 | 235 | 0.2% |
| 10 | 164 | 0.2% |
| 11 | 121 | 0.1% |
| 12 | 99 | 0.1% |
| Other values (80) | 728 | 0.7% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 84939 | 85.9% |
| 4 | 8219 | 8.3% |
| 5 | 2720 | 2.7% |
| 6 | 813 | 0.8% |
| 7 | 539 | 0.5% |
| 8 | 336 | 0.3% |
| 9 | 235 | 0.2% |
| 10 | 164 | 0.2% |
| 11 | 121 | 0.1% |
| 12 | 99 | 0.1% |
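Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 744 | 1 | < 0.1% |
| 353 | 1 | < 0.1% |
| 205 | 1 | < 0.1% |
| 176 | 1 | < 0.1% |
| 172 | 1 | < 0.1% |
| 167 | 2 | < 0.1% |
| 147 | 1 | < 0.1% |
| 137 | 1 | < 0.1% |
| 131 | 1 | < 0.1% |
| 130 | 1 | < 0.1% |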
socialNbFollows
Numeric
| Distinct | 85 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.42567711 |
|---|---|
| Minimum | 0 |
| Maximum | 13764 |
| Zeros | 39 |
| Zeros (%) | < 0.1% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 8 |
| median | 8 |
| Q3 | 8 |
| 95-th percentile | 8 |
| Maximum | 13764 |
| Range | 13764 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 52.83957192 |
|---|---|
| Coefficient of variation (CV) | 6.271255262 |
| Kurtosis | 52718.3891 |
| Mean | 8.42567711 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 220.8766787 |
| Sum | 833409 |
| Variance | 2792.02036 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 94893 | 95.9% |
| 9 | 2386 | 2.4% |
| 10 | 618 | 0.6% |
| 11 | 260 | 0.3% |
| 12 | 148 | 0.1% |
| 13 | 94 | 0.1% |
| 15 | 55 | 0.1% |
| 14 | 53 | 0.1% |
| 7 | 52 | 0.1% |
| 0 | 39 | < 0.1% |
| Other values (75) | 315 | 0.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 39 | < 0.1% |
| 1 | 5 | < 0.1% |
| 2 | 8 | < 0.1% |
| 3 | 6 | < 0.1% |
| 4 | 11 | < 0.1% |
| 5 | 11 | < 0.1% |
| 6 | 7 | < 0.1% |
| 7 | 52 | 0.1% |
| 8 | 94893 | 95.9% |
| 9 | 2386 | 2.4% |
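Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 13764 | 1 | < 0.1% |
| 8268 | 1 | < 0.1% |
| 3649 | 1 | < 0.1% |
| 2013 | 1 | < 0.1% |
| 500 | 1 | < 0.1% |
| 482 | 1 | < 0.1% |
| 450 | 1 | < 0.1% |
| 431 | 1 | < 0.1% |
| 421 | 1 | < 0.1% |
| 209 | 1 | < 0.1% |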
socialProductsLiked
Numeric
| Distinct | 420 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.420743482 |
|---|---|
| Minimum | 0 |
| Maximum | 51671 |
| Zeros | 82987 |
| Zeros (%) | 83.9% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 8 |
| Maximum | 51671 |
| Range | 51671 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 181.0305695 |
|---|---|
| Coefficient of variation (CV) | 40.95025423 |
| Kurtosis | 67765.24122 |
| Mean | 4.420743482 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 244.1577429 |
| Sum | 437269 |
| Variance | 32772.06708 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 82987 | 83.9% |
| 1 | 5261 | 5.3% |
| 2 | 1898 | 1.9% |
| 3 | 1215 | 1.2% |
| 4 | 973 | 1.0% |
| 5 | 644 | 0.7% |
| 6 | 532 | 0.5% |
| 7 | 436 | 0.4% |
| 8 | 359 | 0.4% |
| 9 | 316 | 0.3% |
| Other values (410) | 4292 | 4.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 82987 | 83.9% |
| 1 | 5261 | 5.3% |
| 2 | 1898 | 1.9% |
| 3 | 1215 | 1.2% |
| 4 | 973 | 1.0% |
| 5 | 644 | 0.7% |
| 6 | 532 | 0.5% |
| 7 | 436 | 0.4% |
| 8 | 359 | 0.4% |
| 9 | 316 | 0.3% |
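Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 51671 | 1 | < 0.1% |
| 16040 | 1 | < 0.1% |
| 7044 | 1 | < 0.1% |
| 5979 | 1 | < 0.1% |
| 5598 | 1 | < 0.1% |
| 5595 | 1 | < 0.1% |
| 5109 | 1 | < 0.1% |
| 3037 | 1 | < 0.1% |
| 2942 | 1 | < 0.1% |
| 2823 | 1 | < 0.1% |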
productsListed
Numeric
| Distinct | 65 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.09330421684 |
|---|---|
| Minimum | 0 |
| Maximum | 244 |
| Zeros | 97189 |
| Zeros (%) | 98.3% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 244 |
| Range | 244 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.050143546 |
|---|---|
| Coefficient of variation (CV) | 21.97267835 |
| Kurtosis | 5760.301256 |
| Mean | 0.09330421684 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 64.89321853 |
| Sum | 9229 |
| Variance | 4.203088557 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 97189 | 98.3% |
| 1 | 808 | 0.8% |
| 2 | 278 | 0.3% |
| 3 | 150 | 0.2% |
| 4 | 98 | 0.1% |
| 5 | 62 | 0.1% |
| 6 | 45 | < 0.1% |
| 7 | 40 | < 0.1% |
| 8 | 29 | < 0.1% |
| 10 | 22 | < 0.1% |
| Other values (55) | 192 | 0.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 97189 | 98.3% |
| 1 | 808 | 0.8% |
| 2 | 278 | 0.3% |
| 3 | 150 | 0.2% |
| 4 | 98 | 0.1% |
| 5 | 62 | 0.1% |
| 6 | 45 | < 0.1% |
| 7 | 40 | < 0.1% |
| 8 | 29 | < 0.1% |
| 9 | 20 | < 0.1% |
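Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 244 | 1 | < 0.1% |
| 217 | 1 | < 0.1% |
| 202 | 1 | < 0.1% |
| 185 | 1 | < 0.1% |
| 123 | 1 | < 0.1% |
| 122 | 1 | < 0.1% |
| 117 | 2 | < 0.1% |
| 113 | 1 | < 0.1% |
| 102 | 1 | < 0.1% |
| 96 | 1 | < 0.1% |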
productsSold
Numeric
| Distinct | 75 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1215917018 |
|---|---|
| Minimum | 0 |
| Maximum | 174 |
| Zeros | 96877 |
| Zeros (%) | 97.9% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 174 |
| Range | 174 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.126895354 |
|---|---|
| Coefficient of variation (CV) | 17.49210943 |
| Kurtosis | 2355.673441 |
| Mean | 0.1215917018 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 41.59563253 |
| Sum | 12027 |
| Variance | 4.523683846 |
| Monotonicity | Decreasing |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 96877 | 97.9% |
| 1 | 917 | 0.9% |
| 2 | 325 | 0.3% |
| 3 | 154 | 0.2% |
| 4 | 124 | 0.1% |
| 6 | 58 | 0.1% |
| 5 | 58 | 0.1% |
| 7 | 45 | < 0.1% |
| 9 | 42 | < 0.1% |
| 8 | 31 | < 0.1% |
| Other values (65) | 282 | 0.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 96877 | 97.9% |
| 1 | 917 | 0.9% |
| 2 | 325 | 0.3% |
| 3 | 154 | 0.2% |
| 4 | 124 | 0.1% |
| 5 | 58 | 0.1% |
| 6 | 58 | 0.1% |
| 7 | 45 | < 0.1% |
| 8 | 31 | < 0.1% |
| 9 | 42 | < 0.1% |
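Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 174 | 1 | < 0.1% |
| 170 | 1 | < 0.1% |
| 163 | 1 | < 0.1% |
| 152 | 1 | < 0.1% |
| 125 | 1 | < 0.1% |
| 123 | 1 | < 0.1% |
| 108 | 1 | < 0.1% |
| 106 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 92 | 1 | < 0.1% |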
productsPassRate
Numeric
| Distinct | 72 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.8123027307 |
|---|---|
| Minimum | 0 |
| Maximum | 100 |
| Zeros | 97979 |
| Zeros (%) | 99.1% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 8.500205194 |
|---|---|
| Coefficient of variation (CV) | 10.46433167 |
| Kurtosis | 114.0391218 |
| Mean | 0.8123027307 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 10.66729865 |
| Sum | 80347.3 |
| Variance | 72.25348834 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 97979 | 99.1% |
| 100 | 441 | 0.4% |
| 66 | 63 | 0.1% |
| 50 | 57 | 0.1% |
| 75 | 42 | < 0.1% |
| 83 | 25 | < 0.1% |
| 90 | 25 | < 0.1% |
| 80 | 22 | < 0.1% |
| 85 | 20 | < 0.1% |
| 60 | 16 | < 0.1% |
| Other values (62) | 223 | 0.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 97979 | 99.1% |
| 25 | 5 | < 0.1% |
| 28 | 2 | < 0.1% |
| 31 | 1 | < 0.1% |
| 33 | 8 | < 0.1% |
| 35 | 1 | < 0.1% |
| 37 | 2 | < 0.1% |
| 40 | 2 | < 0.1% |
| 41.6 | 1 | < 0.1% |
| 42 | 1 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 441 | 0.4% |
| 99 | 1 | < 0.1% |
| 98.7 | 1 | < 0.1% |
| 98 | 8 | < 0.1% |
| 96.4 | 1 | < 0.1% |
| 96.2 | 1 | < 0.1% |
| 96 | 5 | < 0.1% |
| 95 | 5 | < 0.1% |
| 94 | 8 | < 0.1% |
| 93 | 12 | < 0.1% |
productsWished
Numeric
| Distinct | 279 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.562595412 |
|---|---|
| Minimum | 0 |
| Maximum | 2635 |
| Zeros | 89612 |
| Zeros (%) | 90.6% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 2635 |
| Range | 2635 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 25.19279323 |
|---|---|
| Coefficient of variation (CV) | 16.12240317 |
| Kurtosis | 3369.163069 |
| Mean | 1.562595412 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 49.25695941 |
| Sum | 154561 |
| Variance | 634.6768308 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 89612 | 90.6% |
| 1 | 3375 | 3.4% |
| 2 | 1339 | 1.4% |
| 3 | 797 | 0.8% |
| 4 | 526 | 0.5% |
| 5 | 406 | 0.4% |
| 6 | 299 | 0.3% |
| 7 | 252 | 0.3% |
| 8 | 176 | 0.2% |
| 9 | 158 | 0.2% |
| Other values (269) | 1973 | 2.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 89612 | 90.6% |
| 1 | 3375 | 3.4% |
| 2 | 1339 | 1.4% |
| 3 | 797 | 0.8% |
| 4 | 526 | 0.5% |
| 5 | 406 | 0.4% |
| 6 | 299 | 0.3% |
| 7 | 252 | 0.3% |
| 8 | 176 | 0.2% |
| 9 | 158 | 0.2% |
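Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 2635 | 1 | < 0.1% |
| 1916 | 1 | < 0.1% |
| 1900 | 1 | < 0.1% |
| 1842 | 1 | < 0.1% |
| 1820 | 1 | < 0.1% |
| 1783 | 1 | < 0.1% |
| 1622 | 1 | < 0.1% |
| 1295 | 1 | < 0.1% |
| 1225 | 1 | < 0.1% |
| 1113 | 1 | < 0.1% |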
productsBought
Numeric
| Distinct | 70 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1719288668 |
|---|---|
| Minimum | 0 |
| Maximum | 405 |
| Zeros | 93494 |
| Zeros (%) | 94.5% |
| Memory size | 772.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 405 |
| Range | 405 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.332265666 |
|---|---|
| Coefficient of variation (CV) | 13.56529424 |
| Kurtosis | 11871.75975 |
| Mean | 0.1719288668 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 84.79735987 |
| Sum | 17006 |
| Variance | 5.439463136 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 93494 | 94.5% |
| 1 | 3297 | 3.3% |
| 2 | 845 | 0.9% |
| 3 | 364 | 0.4% |
| 4 | 214 | 0.2% |
| 5 | 139 | 0.1% |
| 6 | 108 | 0.1% |
| 7 | 65 | 0.1% |
| 8 | 52 | 0.1% |
| 9 | 40 | < 0.1% |
| Other values (60) | 295 | 0.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 93494 | 94.5% |
| 1 | 3297 | 3.3% |
| 2 | 845 | 0.9% |
| 3 | 364 | 0.4% |
| 4 | 214 | 0.2% |
| 5 | 139 | 0.1% |
| 6 | 108 | 0.1% |
| 7 | 65 | 0.1% |
| 8 | 52 | 0.1% |
| 9 | 40 | < 0.1% |
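Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 405 | 1 | < 0.1% |
| 279 | 1 | < 0.1% |
| 174 | 1 | < 0.1% |
| 115 | 1 | < 0.1% |
| 105 | 1 | < 0.1% |
| 93 | 1 | < 0.1% |
| 87 | 1 | < 0.1% |
| 85 | 1 | < 0.1% |
| 81 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |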
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
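As a minimal sketch of that calculation (covariance divided by the product of the standard deviations), the function below uses only the standard library. The name `pearson_r` is illustrative; the sample values are the `productsListed` and `productsSold` columns of the ten rows shown under "First rows", so the result describes only those ten rows and not the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# productsListed and productsSold from the ten "First rows" records.
listed = [26, 19, 33, 122, 25, 47, 31, 5, 0, 123]
sold = [174, 170, 163, 152, 125, 123, 108, 106, 104, 92]
print(pearson_r(listed, sold))
```

Rescaling or shifting either input (for example `[10 * v + 3 for v in listed]`) leaves the result unchanged, which is the location and scale invariance described above.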
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
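The rank-based definition can be sketched as Pearson's r applied to ranks. The helper names `average_ranks` and `spearman_rho` are illustrative, and giving tied values the average of their positions is an assumption (a common convention); the report does not state how it handles ties.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the ranks over the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# A nonlinear but strictly increasing relationship is perfectly monotonic.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Tie handling matters for this dataset, since most columns consist largely of a single repeated value.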
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
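The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that formula; the name `kendall_tau_a` is illustrative. Statistical libraries often use tie-adjusted variants instead, and which variant produced the report's values is not stated here, so this is not a reproduction of the report's computation.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: five concordant, one discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This loop is quadratic in the number of observations, so it suits small samples rather than all 98913 rows.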
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | language | socialNbFollowers | socialNbFollows | socialProductsLiked | productsListed | productsSold | productsPassRate | productsWished | productsBought |
|---|---|---|---|---|---|---|---|---|---|
| 0 | en | 147 | 10 | 77 | 26 | 174 | 74.0 | 104 | 1 |
| 1 | en | 167 | 8 | 2 | 19 | 170 | 99.0 | 0 | 0 |
| 2 | fr | 137 | 13 | 60 | 33 | 163 | 94.0 | 10 | 3 |
| 3 | en | 131 | 10 | 14 | 122 | 152 | 92.0 | 7 | 0 |
| 4 | en | 167 | 8 | 0 | 25 | 125 | 100.0 | 0 | 0 |
| 5 | de | 130 | 12 | 1 | 47 | 123 | 91.0 | 0 | 0 |
| 6 | en | 121 | 0 | 1140 | 31 | 108 | 94.0 | 531 | 105 |
| 7 | fr | 53 | 9 | 3 | 5 | 106 | 98.0 | 0 | 0 |
| 8 | it | 744 | 13764 | 51671 | 0 | 104 | 85.0 | 1842 | 0 |
| 9 | en | 57 | 8 | 45 | 123 | 92 | 74.0 | 6 | 2 |
Last rows
| | language | socialNbFollowers | socialNbFollows | socialProductsLiked | productsListed | productsSold | productsPassRate | productsWished | productsBought |
|---|---|---|---|---|---|---|---|---|---|
| 98903 | es | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98904 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98905 | en | 3 | 8 | 6 | 0 | 0 | 0.0 | 0 | 0 |
| 98906 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98907 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98908 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98909 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98910 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98911 | it | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 98912 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 |
Most frequent
| | language | socialNbFollowers | socialNbFollows | socialProductsLiked | productsListed | productsSold | productsPassRate | productsWished | productsBought | count |
|---|---|---|---|---|---|---|---|---|---|---|
| 144 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 36601 |
| 817 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 18896 |
| 1135 | it | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 5266 |
| 0 | de | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 4690 |
| 734 | es | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 4527 |
| 488 | en | 4 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 2576 |
| 190 | en | 3 | 8 | 1 | 0 | 0 | 0.0 | 0 | 0 | 1552 |
| 998 | fr | 4 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | 1534 |
| 856 | fr | 3 | 8 | 1 | 0 | 0 | 0.0 | 0 | 0 | 739 |
| 151 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 1 | 0 | 649 |
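A table like "Most frequent" can be produced by counting identical rows. The sketch below uses shortened toy rows (language, followers, follows) rather than the real data, and it assumes that a row counts as a duplicate when an identical row appeared earlier; the report does not state its exact definition. The name `duplicate_summary` is illustrative.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (rows ordered by frequency, number of repeated rows)."""
    counts = Counter(rows)
    # Every occurrence after the first of an identical row is a duplicate.
    n_duplicates = sum(c - 1 for c in counts.values())
    return counts.most_common(), n_duplicates


# Toy rows only: (language, socialNbFollowers, socialNbFollows).
rows = [
    ("en", 3, 8),
    ("fr", 3, 8),
    ("en", 3, 8),
    ("en", 4, 8),
    ("fr", 3, 8),
    ("en", 3, 8),
]
most_frequent, n_duplicates = duplicate_summary(rows)
print(most_frequent[0], n_duplicates)

# The headline share in the report follows from its two counts.
print(round(100 * 91012 / 98913, 1))  # 92.0
```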